Robust estimation of speech in noisy backgrounds based on aspects of the auditory process.
Authors
Abstract
A new approach to speech enhancement is proposed in which constraints based on aspects of the auditory process augment an iterative enhancement framework. The basic enhancement framework is based on a previously developed dual-channel scenario using a two-step iterative Wiener filtering algorithm. Constraints across broad speech sections and over iterations are then experimentally developed on a novel auditory representation derived by transforming the speech magnitude spectrum. The spectral transformations model aspects of the human auditory process, including critical band filtering, intensity-to-loudness conversion, and lateral inhibition. The auditory transformations and perceptually based constraints are shown to result in a new set of auditory constrained and enhanced linear prediction (ACE-LP) parameters. The ACE-LP based speech spectrum is then incorporated into the iterative Wiener filtering framework. The improvements due to auditory constraints are demonstrated in several areas. The proposed auditory representation is shown to result in improved spectral characterization in background noise. The auditory constrained iterative enhancement (ACE-II) algorithm is shown to result in improved quality over all sections of enhanced speech. Adaptation of auditory-based constraints to changing spectral characteristics over broad classes of speech is another novel aspect of the proposed algorithm. The consistency of speech quality improvement for the ACE-II algorithm is illustrated over time and across all phonemes classified over a large set of phonetically balanced sentences from the TIMIT database. This study demonstrates the application of auditory-based perceptual properties of a human listener to speech enhancement in noise, resulting in improved and consistent speech quality over all regions of speech.
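The three auditory transformations named in the abstract (critical band filtering, intensity-to-loudness conversion, and lateral inhibition) can be illustrated with a minimal Python sketch. This is not the authors' ACE-LP procedure, whose details are not given here. Every specific choice below is an assumption made for illustration: the Traunmüller Hz-to-Bark approximation, rectangular non-overlapping bands, cube-root loudness compression, the three-tap inhibition kernel, and all function and parameter names.

```python
import numpy as np


def hz_to_bark(f_hz):
    """Hz -> Bark using Traunmueller's approximation (an assumed choice)."""
    f = np.asarray(f_hz, dtype=float)
    return 26.81 * f / (1960.0 + f) - 0.53


def critical_band_energies(power_spectrum, sample_rate, n_bands=18):
    """Sum power-spectrum bins into equal-width Bark bands.

    Assumes power_spectrum covers 0 .. sample_rate / 2 on a linear grid
    and uses rectangular, non-overlapping bands (a simplification).
    """
    power_spectrum = np.asarray(power_spectrum, dtype=float)
    freqs = np.linspace(0.0, sample_rate / 2.0, power_spectrum.shape[0])
    bark = hz_to_bark(freqs)
    edges = np.linspace(bark[0], bark[-1], n_bands + 1)
    # Interior edges map every bin to a band index in 0 .. n_bands - 1.
    band_index = np.digitize(bark, edges[1:-1])
    energies = np.zeros(n_bands)
    np.add.at(energies, band_index, power_spectrum)
    return energies


def intensity_to_loudness(energies):
    """Cube-root compression as a stand-in for intensity-to-loudness."""
    return np.cbrt(np.asarray(energies, dtype=float))


def lateral_inhibition(loudness, strength=0.25):
    """Subtract a fraction of each neighbour band, then half-wave rectify.

    The kernel [-strength, 1, -strength] is a hypothetical choice that
    sharpens spectral peaks relative to their surroundings.
    """
    padded = np.pad(np.asarray(loudness, dtype=float), 1, mode="edge")
    kernel = np.array([-strength, 1.0, -strength])
    sharpened = np.convolve(padded, kernel, mode="valid")
    return np.maximum(sharpened, 0.0)


def auditory_representation(power_spectrum, sample_rate, n_bands=18):
    """Chain the three illustrative stages on one frame's power spectrum."""
    bands = critical_band_energies(power_spectrum, sample_rate, n_bands)
    return lateral_inhibition(intensity_to_loudness(bands))
```

For one frame, `auditory_representation(np.abs(np.fft.rfft(frame)) ** 2, sample_rate)` returns a vector with one value per band. In the approach the abstract describes, constraints are developed on such a representation before the result feeds the iterative Wiener filter. That constraint step is not sketched here.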
Similar resources
An Information-Theoretic Discussion of Convolutional Bottleneck Features for Robust Speech Recognition
Convolutional Neural Networks (CNNs) have demonstrated strong performance in speech recognition systems, both for feature extraction and for acoustic modeling. In addition, CNNs have been used for robust speech recognition, and competitive results have been reported. A Convolutive Bottleneck Network (CBN) is a kind of CNN that has a bottleneck layer among its fully connected layers. The bottleneck fea...
A new method for robust speech recognition based on missing data using a bidirectional neural network
The performance of speech recognition systems is greatly reduced when speech is corrupted by noise. One common approach to robust speech recognition is the family of missing feature methods. In this approach, the components of the time-frequency representation of the signal (the spectrogram) that have a low signal-to-noise ratio (SNR) are tagged as missing and deleted, then replaced using the remaining components and statistical ...
Improving the performance of MFCC for Persian robust speech recognition
Mel-frequency cepstral coefficients (MFCCs) are the most widely used features in speech recognition, but they are very sensitive to noise. In this paper, to achieve satisfactory performance in Automatic Speech Recognition (ASR) applications, we introduce a new noise-robust set of MFCC vectors estimated through the following steps. First, spectral mean normalization is a pre-processing which applies to t...
Speech Emotion Recognition Based on Power Normalized Cepstral Coefficients in Noisy Conditions
In recent years, automatic recognition of emotional states from speech in noisy conditions has become an important research topic in emotional speech recognition. This paper considers the recognition of emotional states via speech in real environments. For this task, we employ the power normalized cepstral coefficients (PNCC) in a speech emotion recognition system. We investigate its perfor...
Speech Enhancement Using Gaussian Mixture Models, Explicit Bayesian Estimation and Wiener Filtering
Gaussian Mixture Models (GMMs) of power spectral densities of speech and noise are used with explicit Bayesian estimations in Wiener filtering of noisy speech. No assumption is made on the nature or stationarity of the noise. No voice activity detection (VAD) or any other means is employed to estimate the input SNR. The GMM mean vectors are used to form sets of over-determined system of equatio...
Journal title:
- The Journal of the Acoustical Society of America
Volume 97, Issue 6
Pages -
Publication date 1995